ABSTRACT

Degenerate splice site sequences mark the intron
boundaries of pre-mRNA transcripts in multicellular
eukaryotes. The essential pre-mRNA splicing factor
U2AF⁶⁵ is faced with the paradoxical tasks of accurately
targeting polypyrimidine (Py) tracts preceding
3' splice sites while adapting to both cytidine and
uridine nucleotides with nearly equivalent
frequencies. To understand how U2AF⁶⁵ recognizes
degenerate Py tracts, we determined six crystal
structures of human U2AF⁶⁵ bound to cytidine-containing
Py tracts. As deoxy-ribose backbones
were required for co-crystallization with these Py
tracts, we also determined two baseline structures
of U2AF⁶⁵ bound to the deoxy-uridine counterparts
and compared them with the original, RNA-bound structure.
Local structural changes suggest that the
N-terminal RNA recognition motif 1 (RRM1) is more
promiscuous for cytosine-containing Py tracts than
the C-terminal RRM2. These structural differences
between the RRMs were reinforced by the
specificities of wild-type and site-directed mutant
U2AF⁶⁵ for region-dependent cytosine- and
uracil-containing RNA sites. Small-angle X-ray scattering
analyses further demonstrated that Py tract
variations select distinct inter-RRM spacings from
a pre-existing ensemble of U2AF⁶⁵ conformations.
Our results highlight both local and global conformational
selection as a means for universal 3' splice
site recognition by U2AF⁶⁵.



INTRODUCTION 

Pre-mRNA splicing removes non-coding introns and 
regulates most human transcripts (1); however, the mech- 
anisms by which the splice sites are identified and 
regulated are not well understood. Splicing fidelity is 
assisted by an ATP-dependent series of checkpoints 
during the assembly of >100 proteins and five small 
nuclear (sn)RNAs into the active spliceosome [reviewed 
in (2)]. Consensus sequences mark the 5' and 3' splice 
sites at the intron-exon boundaries of the pre-mRNA. 
Nevertheless, to be distinguished and regulated in a 
specific manner, the 'weak' regulated splice sites of multi- 
cellular organisms often deviate from the optimum con- 
sensus of 'strong' constitutive splice sites (3). The 
spliceosome is directed to the 3' splice site of the 
pre-mRNA by a polypyrimidine (Py) tract located 
between the branchpoint sequence and an AG dinucleotide
(Figure 1A). The Py tract exemplifies the variability of
human splice site signals. Cytidines precede the 3' splice
site at frequencies approaching those of uridine (~35 and 
45%, respectively) (4,5), yet cytidine and uridine confer 
different Py tract activities. Increasing the number of 
uridines in a Py tract generally increases use of an 
adjacent 3' splice site (6-8). Isolated cytidine nucleotides 
in a uridine-rich Py tract can support or even enhance the 
use of an adjacent 3' splice site (6,9), yet several consecu- 
tive cytidines can abolish detectable splicing (7,9). 
Mutations that compromise recognition of splice site 
signals can be lethal or lead to genetic diseases and 
cancer [reviewed in (10,11)], such as shortening of a cftr 
Py tract that is responsible for some cases of cystic fibrosis 
(12,13). Consequently, the spliceosomal proteins and
snRNAs must overcome the challenge of recognizing degenerate
signal sequences embedded within pre-mRNAs
that are thousands of nucleotides long.

Figure 1. Crystal structures of U2AF⁶⁵ bound to diverse Py tracts. (A) Splicing factors recognizing the 3' splice site. A pictogram of the primate Py
tract consensus derived from (4) is inset below. The C- to N-terminal orientation of the U2AF⁶⁵ domains recognizes the 5'-to-3' direction of the
pre-mRNA. BPS, branchpoint sequence; Py tract, polypyrimidine tract; AG dinucleotide. The U2AF⁶⁵ RRMs are numbered in sequence and 'U'
indicates the C-terminal UHM. The U2AF⁶⁵ domains and inter-RRM linker sequences are given in Supplementary Figure S1. (B) The nomenclature,
Py tract sequences, resolution limits and crystallographic R-factors are listed for the dU2AF⁶⁵1,2 crystal structures. The binding registers of the Py
tracts are aligned relative to the seven nucleotide binding sites of the original dU2AF⁶⁵1,2/rU12 structure (PDB ID 2G4B). Electron density maps
such as those shown in Supplementary Figure S2 were used to assign the Br-dU positions. Py tracts of distinct copies in the crystallographic
asymmetric unit are listed separately (oligonucleotides 'P' and 'E'; (Br-dU5)dC1 also includes oligonucleotides 'H' and 'K'). Alternative binding
registers marked V/' and 'ft' are italicized. Nucleotides marked with grey font lack interpretable electron density and were not included in the final
structures. Supplementary Table S1 details the crystallographic data and refinement statistics.

The U2 small nuclear ribonucleoprotein auxiliary factor
65 kDa (U2AF⁶⁵) universally recognizes degenerate Py
tracts preceding 3' splice sites during the early stages of
pre-mRNA splicing (14-17) (Figure 1A), and there facilitates
ATP-dependent association of the U2 snRNP (14).
U2AF⁶⁵ is essential for vertebrate development (18), and
specific U2AF⁶⁵ deficiencies are associated with cystic
fibrosis (19), myotonic dystrophy (20) and several
cancers (21-23). U2AF⁶⁵ recognizes the Py tract via two
central RNA recognition motifs (RRM1 and RRM2)
(17,24) (Figure 1A, Supplementary Figure S1A). Our
structure of the core RRM1 and RRM2-containing
domain of U2AF⁶⁵ bound to an optimal poly-uridine
(poly-rU) RNA (25) reveals that U2AF⁶⁵ forms specific
hydrogen bonds with the edges of the uracil bases,
including the N3 and O4 atoms where cytosine differs
from uracil. Subsequent NMR characterizations confirmed
the interactions of the individual U2AF⁶⁵ RRMs
with uridines in solution (26), although the inter-RRM
arrangements differ from the solution configuration
owing to an internal deletion of the inter-RRM linker
that was required for crystallization (27). The observation
of apparently specific hydrogen bonds between U2AF⁶⁵
and the uracil bases served to highlight the question:
how can U2AF⁶⁵ universally target the diverse Py tracts
of multicellular eukaryotes?

To address this question, here we report six crystal
structures of human U2AF⁶⁵ bound to cytosine-containing
Py tracts. As deoxy-ribose backbones were required
for co-crystallization of these Py tracts, we also
determined two crystal structures of the deoxy-uridine
(dU)-containing counterparts and compare them with
the original RNA-bound structure. Structural differences,
coupled with the region-dependent Py tract affinities of
U2AF⁶⁵ and site-directed mutants and small-angle X-ray
scattering (SAXS) data of U2AF⁶⁵ complexes with various
Py tracts, suggest that U2AF⁶⁵ adapts to degenerate splice
site signals through conformational selection of a promiscuous
N-terminal RRM1 and a stringent C-terminal
RRM2.



MATERIALS AND METHODS 

Crystallographic analyses 

Given that 7 of 12 co-crystallized uridines were observed
in the original structure (27), we obtained new crystal
forms of a U2AF⁶⁵ variant (dU2AF⁶⁵1,2, including
residues 148-237 and 258-336) bound to minimal,
seven-nucleotide sites and marked by 5-bromo-uridines
(Br-dU) to define the oligonucleotide-binding registers
(Figure 1B, Supplementary Table S1, Supplementary
Figure S2). A deoxy-ribose backbone for the uracil bases
was required to co-crystallize the 7-mer oligonucleotides;
the U2AF⁶⁵ protein binds deoxy-ribonucleotides with reasonable
affinity (25,28) and contacts only a single 2'
hydroxyl group at the 5' terminus of the poly-rU RNA
(25). We previously demonstrated that the U2AF⁶⁵
variant (dU2AF⁶⁵1,2, including residues 148-237 and
258-336) used for crystallization exhibits similar RNA
affinities, splicing efficiencies and protein/RNA contacts
as the unmodified U2AF⁶⁵ counterpart (25,27,29). The
interdomain linker is poorly conserved (Supplementary
Figure S1B), and residues 238-257 can be replaced with
unrelated sequences without penalizing the affinity or
ability of U2AF⁶⁵ to support pre-mRNA splicing (29).

Human U2AF⁶⁵ constructs were expressed and purified
as described (27). Structures were determined by molecular
replacement using the RRMs of PDB ID 2G4B as
search models and refined using REFMAC5 (30).
Purified DNA oligonucleotides were purchased from
Integrated DNA Technologies, Inc. Protein and DNA
concentrations were estimated using the respective absorbance
at 280 nm or 260 nm and calculated molar extinction
coefficients (31,32). Protein (20 mg/mL) and DNA were
mixed in a 1:1.2 molar ratio with 4 mM
[N,N'-Bis(3-D-gluconamidopropyl) deoxycholamide]. Crystals were
grown by the hanging drop method against a reservoir
solution of 1.5-1.6 M ammonium sulfate, 10% dioxane
and 0.1 M 2-(N-morpholino)ethanesulfonic acid (pH 6.0)
at 4°C and flash cooled following stepwise transfer to 21%
glycerol as the cryoprotectant. Data collection and refinement
statistics are reported in Supplementary Table S1.
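
The concentration and mixing arithmetic described above can be sketched as follows. This is a minimal illustration of the Beer-Lambert law and of a 1:1.2 protein:DNA molar ratio, not the procedure actually used by the authors; the extinction coefficients, absorbance readings and volumes are hypothetical placeholders, and the function names are introduced here for illustration only.

```python
def molar_concentration(absorbance, extinction_coeff, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    return absorbance / (extinction_coeff * path_cm)


def dna_volume_for_ratio(protein_molar, protein_volume_ul, dna_molar, ratio=1.2):
    """Volume of DNA stock (uL) that supplies `ratio` mol DNA per mol protein."""
    return ratio * protein_molar * protein_volume_ul / dna_molar


# Hypothetical extinction coefficients (M^-1 cm^-1); real values would be
# calculated from the protein and oligonucleotide sequences.
EPS_PROTEIN_280 = 10_000.0
EPS_DNA_260 = 60_000.0

# Hypothetical absorbance readings of diluted stocks.
protein_molar = molar_concentration(0.50, EPS_PROTEIN_280)
dna_molar = molar_concentration(0.60, EPS_DNA_260)

# DNA volume needed for 10 uL of protein at a 1:1.2 protein:DNA ratio.
dna_volume_ul = dna_volume_for_ratio(protein_molar, 10.0, dna_molar)
print(f"protein {protein_molar:.2e} M, DNA {dna_molar:.2e} M, "
      f"add {dna_volume_ul:.1f} uL DNA")
```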

Surface plasmon resonance experiments 

Surface plasmon resonance (SPR) experiments were
completed as described previously (25) with minor modifications.
To reduce steric effects, a 14-atom linker separated
the 5' biotin used for immobilization from the 5' phosphate
of the oligonucleotide. As the on/off rates approach the
limit for reliable measurement, the average equilibrium
response at each concentration was fit to a steady-state
binding model. Repeated washes with the running buffer
were sufficient to regenerate the surface following each
injection. At least two independent experiments were
repeated for each oligonucleotide/protein combination,
and the average KD values are reported in Table 1.
Representative sensorgrams and binding curves are
shown in Supplementary Figure S6. Isothermal titration
calorimetry (ITC) experiments described in
Supplementary Figure S7 independently confirmed the
preference of U2AF⁶⁵1,2 to bind 5'-4rU over 3'-4rU RNAs.
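
A steady-state fit of the kind described above can be sketched as follows, assuming the simplest 1:1 (Langmuir) model, Req = Rmax·C/(KD + C). The text does not specify the exact model or software, so this is an illustrative assumption; the concentrations and responses are synthetic, and the fitting routine (a coarse log-scale scan of KD followed by a golden-section refinement, with Rmax solved in closed form at each KD) is a stand-in for whatever instrument software was used.

```python
import math


def langmuir(conc, kd, rmax):
    """Equilibrium response for 1:1 binding at analyte concentration conc."""
    return rmax * conc / (kd + conc)


def _profile(kd, concs, responses):
    """Best-fit Rmax and sum of squared errors for a fixed KD."""
    f = [c / (kd + c) for c in concs]
    rmax = sum(r * fi for r, fi in zip(responses, f)) / sum(fi * fi for fi in f)
    sse = sum((r - rmax * fi) ** 2 for r, fi in zip(responses, f))
    return sse, rmax


def fit_steady_state(concs, responses, log_kd_min=-9.0, log_kd_max=-3.0, grid=121):
    """Least-squares KD (M) and Rmax from equilibrium responses."""
    # Coarse scan over log10(KD) to bracket the minimum.
    step = (log_kd_max - log_kd_min) / (grid - 1)
    logs = [log_kd_min + i * step for i in range(grid)]
    best = min(logs, key=lambda x: _profile(10.0 ** x, concs, responses)[0])
    lo, hi = best - step, best + step
    # Golden-section refinement inside the bracket.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        a = hi - g * (hi - lo)
        b = lo + g * (hi - lo)
        if _profile(10.0 ** a, concs, responses)[0] < _profile(10.0 ** b, concs, responses)[0]:
            hi = b
        else:
            lo = a
    kd = 10.0 ** ((lo + hi) / 2.0)
    return kd, _profile(kd, concs, responses)[1]


# Synthetic, noise-free data generated with KD = 2 uM and Rmax = 100 RU.
concs = [0.25e-6, 0.5e-6, 1e-6, 2e-6, 4e-6, 8e-6, 16e-6]
responses = [langmuir(c, 2e-6, 100.0) for c in concs]
kd_fit, rmax_fit = fit_steady_state(concs, responses)
print(f"KD = {kd_fit:.3e} M, Rmax = {rmax_fit:.1f} RU")
```

Fitting the plateau responses rather than the association and dissociation phases avoids relying on kinetic rates that are too fast to measure reliably.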

SAXS analyses 

Following size exclusion chromatography to prepare
monodisperse samples, the appropriate protein:RNA
stoichiometries were verified by the absorbance ratios of
the final SAXS samples at 280 nm and 260 nm. SAXS
samples were free of interparticle effects based on the
agreement of scattering and Guinier plots over a range
of concentrations (Supplementary Figure S8). Ensembles
of 20 different PDB structure files were fit using the
program EOM (33) and significantly improved the χ²
values over single models (Supplementary Table S2). As
the 13-nucleotide Py tracts contribute only 11% of the
total scattering mass, the SAXS data primarily reflect
the protein conformations. The rigid body models are
composed of the RRM1 and RRM2 structures from
PDB ID 2G4B and U2AF Homology Motif (UHM)
structure from PDB ID 1OPI connected by ab initio
linkers.
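
The Guinier analysis mentioned above rests on the low-angle approximation ln I(q) = ln I(0) − (Rg²/3)·q², so a straight-line fit of ln I against q² yields the radius of gyration Rg and the forward scattering I(0). The sketch below illustrates that fit on synthetic data; the Rg, I(0) and q values are hypothetical and chosen only so that q·Rg stays within the usual Guinier range. Comparing the fitted Rg across a concentration series is one common check for interparticle effects.

```python
import math


def guinier_fit(q, intensity):
    """Return (Rg, I0) from a linear fit of ln I versus q^2."""
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic data: Rg = 25 A, I(0) = 1000 (arbitrary units), q in 1/A.
q_values = [0.005 + 0.005 * i for i in range(10)]
intensities = [1000.0 * math.exp(-(25.0 ** 2 / 3.0) * qi ** 2) for qi in q_values]
rg_fit, i0_fit = guinier_fit(q_values, intensities)
print(f"Rg = {rg_fit:.2f} A, I(0) = {i0_fit:.1f}")
```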

RESULTS 

Structures of human U2AF⁶⁵ bound to
cytidine-containing sites

To understand how U2AF⁶⁵ adapts to cytidines, we
determined eight structures of dU2AF⁶⁵1,2 bound to
oligonucleotides, six of which contained single cytidines
(Figure 1B, Supplementary Table S1). We previously
determined the structure of a U2AF⁶⁵ variant
(dU2AF⁶⁵1,2 lacking residues 238-257 of the inter-RRM
linker) bound to a 12-mer poly-rU RNA (rU12) (25). The
dU2AF⁶⁵1,2 protein is required for crystallization (27)
and exhibits similar RNA affinities, splicing efficiencies
and protein/RNA contacts as the wild-type U2AF⁶⁵1,2
counterpart (25-27,29). To distinguish cytidine from
uridine, we co-crystallized dU2AF⁶⁵1,2 with the core
deoxy-uridine (dU) 7-mer with the sequence register
marked by 5-Br-dU (Supplementary Figure S2). A DNA
backbone for the uracil bases was required for
co-crystallization of these shorter oligonucleotides with
U2AF⁶⁵. The U2AF⁶⁵ protein binds a 20-mer dU oligonucleotide
with a 4-7-fold apparent decrease in affinity
relative to the RNA counterpart, consistent with a
single 2' hydroxyl contact with the 5' terminal nucleotide
in the crystal structure of dU2AF⁶⁵1,2 bound to poly-rU
RNA (25).

The structures of two baseline Br-dU-containing
complexes (Br-dU3 and Br-dU5) were determined at 2.5
and 2.2 Å resolution, respectively, for comparison with
the cytidine-containing counterparts. Based on the omit
electron density maps, the Br-dU3 strands exhibited alternative
conformations, i.e. a mixed population of two
binding registers with the Br-dU in the fourth and fifth
rather than the third binding sites (Figure 1B,
Supplementary Figure S2A). The slipped binding register
enables the halogen substituent to participate in favourable
aromatic stacking interactions with U2AF⁶⁵ side chains
(34) (Figure S2I) and, at the fifth position, places it near the
positively charged K225 and R227 residues (Figure 4A). The
introduction of cytidines altered the apparent preference for
U2AF⁶⁵ to bind Br-dU at the fourth or fifth sites, in
most cases by stabilizing one of the two pre-existing alternative
conformations (Figure 1B). The altered binding
registers of the cytidine-containing oligonucleotides,
coupled with local changes at the individual sites described
later in the text, support the hypothesis that certain
U2AF⁶⁵ sites can accommodate cytidines more readily
than others. Based on these observations, we subsequently
focused on oligonucleotides containing Br-dU at the fifth
position of the sequence (Br-dU5). For all complexes of
dU2AF⁶⁵1,2 with Br-dU5-containing oligonucleotides, the
bromine appeared at the expected fifth nucleotide binding
site (Supplementary Figure S2E).

Each structure offers two crystallographically independent
views of the protein/oligonucleotide complex
(Figure 2A), with the exceptions of four copies for the
dU2AF⁶⁵1,2/(Br-dU5)dC1 structure and one copy for
the original dU2AF⁶⁵1,2/rU12 structure. The backbone
conformations of the individual RRM1 or RRM2
domains were nearly identical among the structures
(rmsd 0.25-0.35 Å between Cα atoms of the RRMs),
whereas the relative RRM positions differ (rmsd 6-7 Å
between Cα atoms of the overall polypeptide chains)
(Supplementary Figure S3A). The overall domain arrangements
are likely to be influenced by the internal
deletion within the inter-RRM linker that is required for
crystallization and hence also differ from the NMR-based
model for the wild-type U2AF⁶⁵1,2 bound to a 9-mer
poly(rU) RNA (26). Nevertheless, the nucleotide interactions
by the individual RRMs in the crystal structures
agree with the solution data (26) and as such offer the
means to illustrate the local U2AF⁶⁵ interactions with
cytidine-substituted Py tracts at high resolution.

Figure 2. Overall structures of U2AF⁶⁵ recognizing Py tracts.
(A) Overall packing of the two U2AF⁶⁵/Py tract complexes in the
crystallographic asymmetric units. Protein 'A' (blue) respectively binds the
5' and 3' regions of oligonucleotides 'E' and 'P' (magenta); Protein 'B'
(green) respectively binds the 5' and 3' regions of oligonucleotides 'P'
and symmetry-related 'E' (pink). The boxed complex is expanded in
B. (B) Representative structure of dU2AF⁶⁵1,2 (chain 'A') bound to the
Br-dU5 oligonucleotide. The four nucleotides contacting each RRM are
coloured magenta and numbered as discussed in the text. The central,
fourth nucleotide contacts both RRM1 and RRM2 by crystallographic
symmetry as shown in A. Top, view into the face of RRM1 bound to
the 3' region in an identical orientation as the boxed polypeptide in A;
Bottom, view into the face of RRM2 bound to the 5' region following
2-fold rotation about the y-axis.

Figure 3. Structural comparison of U2AF⁶⁵ RRM2 recognizing uracil versus cytosine. (A) First nucleotide binding sites. The dU1 and dU2 of
Br-dU5 oligonucleotides 'E' correspond to the binding sites of the RNA counterpart and are shown in A and B. The distinct binding sites of dU1
and dU2 from the Br-dU5 'P' oligonucleotides are shown in Supplementary Figure S4A and B. SigmaA-weighted 2|Fo|−|Fc| electron density maps are
contoured at 1 σ. The dC1 is disordered and lacks apparent electron density in all four oligonucleotides ('E', 'H', 'K' and 'P') of the (Br-dU5)dC1
structure. The neighbouring dU2 of (Br-dU5)dC1 oligonucleotide 'P' also is disordered. (B) Second nucleotide binding sites. One example of cytidine
in the second binding site of oligonucleotide 'E' is available from the (Br-dU5)dC2 structure. The two alternative conformations of dC2-E are shown
in separate panels for clarity. The distinct cytidine binding site of (Br-dU5)dC2 oligonucleotide 'P' is shown in Supplementary Figure S4C and is
nearly identical for (Br-dU3)dC2-P. (C) Third nucleotide binding sites. The U2AF⁶⁵ interactions are nearly identical with the dU3 nucleotide of the
Br-dU5 structure and corresponding dU2 nucleotide of Br-dU3 structure in both oligonucleotides 'E' and 'P'. One example of cytidine in the third
binding site is available from oligonucleotide 'E' of the (Br-dU3)dC2 structure. Supplementary Movie S1 illustrates the cytidine-induced changes at
the third binding site. In Figures 3 and 4, the uridines are coloured magenta, cytidines are yellow, and grey lines show the nucleotides adjoining the
site of interest. Polar hydrogens are included for clarity. Dashed lines indicate hydrogen bonds (<3.5 Å between non-hydrogen atoms).

The dU4-dU7 of both complexes and dU1-dU4 in one of
the two complexes (oligonucleotide 'E') in the crystallographic
asymmetric unit correspond to the dU2AF⁶⁵1,2-bound
rU12 conformation (rmsd of matching O4, N3, O2
atoms is 0.48 Å for dU4-E–dU7-E/rU4–rU7, 0.50 Å for
dU4-P–dU7-P/rU4–rU7 and 0.73 Å for
dU1-E–dU4-E/rU4–rU7) (Supplementary Figure S3B and D). The
dU1 and dU2 positions of the other crystallographically independent
complex (oligonucleotide 'P' in Figure 2) differ
from oligonucleotide 'E' and the RNA counterpart
(rmsd is 6.4 Å for dU1-P–dU4-P/rU1–rU4 compared
with 0.42 Å for dU3-P–dU4-P/rU3–rU4)
(Supplementary Figure S3C). The altered positions of the
dU1-P and consequently dU2-P nucleotides likely arise
from weakened interactions in the absence of the rU1 5'
terminal phosphate and hydroxyl group (Supplementary
Figure S4). Apart from these differences in the first and
second nucleotides of oligonucleotide 'P', U2AF⁶⁵ recognizes
the dUs in a similar manner as the uridines of the
rU12 RNA (Figures 3–4, Supplementary Figure S5).
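
The rmsd values quoted above compare matched atoms (for example, the O4, N3 and O2 atoms of equivalent nucleotides) between two structures. A minimal sketch of that calculation is given below; it assumes the two coordinate sets have already been superimposed and are listed in the same atom order, and the coordinates shown are hypothetical rather than taken from any deposited structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as input) over paired atoms.

    Both arguments are equal-length sequences of (x, y, z) tuples that are
    assumed to be pre-superimposed and listed in matching atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates (in Angstroms) for two matched atoms.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 0.5), (1.0, 0.0, -0.5)]
example_rmsd = rmsd(reference, shifted)
print(f"rmsd = {example_rmsd:.2f} A")
```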
Figure 4. Structural comparison of U2AF⁶⁵ RRM1 recognizing uracil versus cytosine. (A) Fifth nucleotide binding sites. The U2AF⁶⁵ interactions
are nearly identical among the Br-dU5 nucleotides from oligonucleotides 'E' and 'P' of the Br-dU5 structure and the corresponding dU4 nucleotides
of the Br-dU3 structure. One example of cytidine in the fifth binding site is available from oligonucleotide 'P' of the (Br-dU3)dC4 structure.
Supplementary Movie S2 illustrates the cytidine-induced changes at the fifth binding site. (B) Sixth nucleotide binding sites. The U2AF⁶⁵ interactions
are nearly identical among the dU6 nucleotides from oligonucleotides 'E' and 'P' of the Br-dU5 structure and the corresponding dU5 nucleotides of
the Br-dU3 structure. The dC6 interactions of the (Br-dU5)dC6 oligonucleotides 'E' and 'P' differ slightly in the rotation of the R150 side chain and
are shown in separate panels. As shown in C, the movement of R150-A in turn alters dU7-P in the seventh binding site of the (Br-dU5)dC6
oligonucleotide 'P'. Supplementary Movie S3 illustrates the cytidine-induced changes at the sixth binding site of (Br-dU5)dC6 oligonucleotide 'P'. (C)
Seventh nucleotide binding sites. Cytidine is bound in the seventh binding sites of both oligonucleotides in the (Br-dU3)dC5 structure. The final panel
compares the cytidine-induced shift in the dU7 binding site of the (Br-dU5)dC6 oligonucleotide 'P'.



The C-terminal RRM2 of each polypeptide recognizes
four nucleotides in the 5' region of one oligonucleotide
(dU1-dU4), and the N-terminal RRM1 recognizes four
nucleotides in the 3' region of a separate oligonucleotide
(dU4-dU7) (Figure 2). Consequently, the central dU4 nucleotide
is enclosed by RRM1 and RRM2 contributed by
distinct polypeptides (Supplementary Figure S5). Despite
differences in the relative rotation of the two RRMs
among the crystallographically independent complexes,
the central dU4 base continues to engage in similar
U2AF⁶⁵ contacts among the structures. The K260, N289
and F304 residues that interact with rU4 (here dU4) contribute
significantly to the U2AF⁶⁵1,2 affinity for poly-rU
RNA (by 36-fold for K260A/N289A double mutation and
73-fold for F304A mutation) (25). Given the importance
of these interactions and their similarity despite
crystallographically independent environments, we
suggest that the rU4/dU4 interactions are the coalescence
of two separable binding sites for nucleotides on U2AF⁶⁵
RRM1 and RRM2 such that the crystal structure in effect
represents eight distinct nucleotide binding sites on the
U2AF⁶⁵ protein. Accordingly, in-depth analyses of the
U2AF⁶⁵1,2 affinities for uridine tracts of various lengths
suggest a binding site size of 8-9 nucleotides (26). Efforts
to substitute a cytidine at the fourth binding site of the
crystal structures were unsuccessful, possibly owing to the
dual sets of interactions at this site. Instead, we have
determined structures that seek to place cytidines at six
of the U2AF⁶⁵/nucleotide binding sites.
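
For orientation, a fold-change in affinity can be converted to an approximate free-energy penalty with ΔΔG = RT·ln(fold). The sketch below applies this standard relation to the 36-fold and 73-fold penalties quoted above; the temperature of 298.15 K is an assumption made here for illustration and is not stated in the text, so the resulting energies are indicative only.

```python
import math

GAS_CONSTANT_KCAL = 1.98720e-3  # kcal mol^-1 K^-1


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Free-energy penalty (kcal/mol) for a `fold`-fold loss of affinity."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(fold)


# Fold penalties quoted in the text; temperature is an assumed 298.15 K.
ddg_k260a_n289a = ddg_from_fold_change(36.0)
ddg_f304a = ddg_from_fold_change(73.0)
print(f"K260A/N289A: {ddg_k260a_n289a:.2f} kcal/mol")
print(f"F304A:       {ddg_f304a:.2f} kcal/mol")
```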

The U2AF⁶⁵ RRM2 structure has difficulty
accommodating cytidines in the 5' region of the Py tract

The structures of dU2AF⁶⁵1,2 bound to either
(Br-dU5)dC1, (Br-dU5)dC2 or (Br-dU3)dC2 oligonucleotides
suggest that the U2AF⁶⁵ RRM2 has difficulty
forming stable interactions with cytidine substitutions in
the 5' half of the bound oligonucleotide (Figure 3). Based
on the 2.2 Å resolution Br-dU5 structure, the dU1 of the
oligonucleotide 'E' binds a similar site as rU1 of the
original 2.5 Å resolution dU2AF⁶⁵1,2/rU12
structure (Figure 3A). A bound water molecule
that mediates a hydrogen bond between dU1-E and
T296 of the Br-dU5 structures is likely to be present but
unresolved at the lower, 2.5 Å resolution of the
dU2AF⁶⁵1,2/rU12 complex, given that the 'hydrogen
bond' between T296 and rU1 of the RNA-bound
complex is relatively long (3.4 Å). As described in more
depth for the second site, a K328 side chain that engages
the 5' phosphate of the rU1 nucleotide has shifted in the
Br-dU5 structure to interact with the uracil-O4 of the
neighbouring dU2. With the intention of placing a
cytidine at the first binding site, we determined the structure
of dU2AF⁶⁵1,2 bound to a Br-dU5 variant with a
single cytidine substituted at the first nucleotide
[(Br-dU5)dC1] at 2.5 Å resolution (Figure 3A,
Supplementary Table S1). The four complexes in the
crystallographic asymmetric unit of the (Br-dU5)dC1 structure
exhibited the Br-dU in the expected fifth binding
site (Figure 2B, Supplementary Figure S2F). Rather
than attempting to adapt to the differences between
uracil and cytosine (N3-H and O4 compared with N3
and N4-H atoms, respectively), the first cytidine (dC1)
became disordered and lacked apparent electron density
in any of the four complexes (Figure 3A). In addition,
electron density for the adjacent dU2 nucleotide could
not be interpreted in one of the dU2AF⁶⁵1,2/(Br-dU5)dC1
copies (oligonucleotide 'P'). Although
U2AF⁶⁵ interactions with the 5' terminal deoxy-ribose nucleotides
are weakened by the absence of the 2' hydroxyl
group and terminal phosphate, comparison of the disordered
dC1 and evident dU1 suggests that U2AF⁶⁵ has
more difficulty forming stable contacts with a cytosine
than uracil at the first position of the Py tract. One
possible explanation is that the negative charge of the
D293 side chain is less favourable adjacent to a
cytosine-N3 than a uracil-N3-H.

To capture a cytidine in the second binding site, we
determined the 2.4 and 2.2 Å resolution structures of
dU2AF⁶⁵1,2 bound to the (Br-dU5)dC2 and (Br-dU3)dC2
oligonucleotides, respectively (Figure 3B, Supplementary
Table S1). Both copies of the (Br-dU5)dC2 oligonucleotides
in the crystallographic asymmetric unit bound the Br-dU in
the fifth and dC2 in the second binding sites of U2AF⁶⁵
(Supplementary Figure S2G), whereas only the oligonucleotide
'P' of the (Br-dU3)dC2 structure placed the dC2 in the
second binding site. The rU2 of the dU2AF⁶⁵1,2/rU12 structure
chiefly interacts by base stacking with glycines G264
and G265 and a single hydrogen bond between the
uracil-O4 and the K329 side chain. The U2AF⁶⁵ interactions
with the dU2 counterpart of Br-dU5 oligonucleotide
'E' remained in a similar location as the rU2, except that
in the absence of prior engagement by the 5' phosphate, the
K328 side chain replaces the hydrogen bond of K329 with
the uracil-O4. The movement of K329 appears to facilitate
formation of a water-mediated hydrogen bond between the
polypeptide backbone and the uracil-O4 in the higher resolution
Br-dU5 structure. In the presence of cytosine, the
K328 side chain of the (Br-dU5)dC2 oligonucleotide 'E'
shifted slightly to avoid the major conformation of the
cytosine exocyclic amine. In the minor alternative conformation
of the (Br-dU5)dC2 oligonucleotide 'E', the cytosine
moved to accept a hydrogen bond from the backbone
N-H of the glycine residue (G265) (Figure 3B, far right).
The shifts in the dC2 positions in turn disrupt the U2AF⁶⁵
interactions with the adjacent dU1, which is disordered and
lacks interpretable electron density in any of the structures
with cytidine bound at the second U2AF⁶⁵ site. The dC2 of
the (Br-dU5)dC2 and (Br-dU3)dC2 'P' oligonucleotides
rotated to stack against the L330 side chains yet otherwise
maintained similar interactions as the major conformation
of oligonucleotides 'E' (Supplementary Figure S4C).
Although these structural changes at the 5' terminus of
the oligonucleotide cannot be assumed to parallel the
RNA counterparts, the observed loss of stable U2AF⁶⁵
interactions suggests that the proximity of a positively
charged K328 or K329 lysine could favour a uracil-O4
over a cytosine-N4H2 at the second position of the Py tract.

As mentioned above, the binding register of one of the
(Br-dU3)dC2 complexes (oligonucleotide 'P') places a
cytosine in the second site, where it engages in similar
U2AF⁶⁵ interactions as the (Br-dU5)dC2 oligonucleotide
'P'. The binding register for the other (Br-dU3)dC2
complex (oligonucleotide 'E') places the cytidine in the
third and the Br-dU in the fourth of the U2AF⁶⁵
binding sites (Figure 1B, Supplementary Figure S2B), apparently
stabilizing one of the two alternative binding
registers of the parental Br-dU3 structure. Among the
structures, only the dU2AF⁶⁵1,2/(Br-dU3)dC2 oligonucleotide
'E' bound the Br-dU in the third rather than
fourth or fifth sites, suggesting that the energetic penalty
for fitting a cytidine into the third site is on par with
shifting a Br-dU to a less preferable site.

The cytidine at the third site induces relatively large,
well-ordered changes in the U2AF⁶⁵ structure
(Figure 3C, Supplementary Movie S1). The U2AF⁶⁵ interactions
are indistinguishable between the rU3 of the
original, RNA-bound structure and the dU3 from either
the 'E' or 'P' oligonucleotides of the Br-dU5 structure.
Namely, the Q333 side chain and A335 carbonyl respectively
engage the uracil-O4 and N3-H in hydrogen bonds.
Three structural changes in the U2AF⁶⁵ protein are
observed in response to the uracil-to-cytosine substitution:
(i) the Q333 side chain undergoes a torsional rotation to
accept a hydrogen bond from the cytosine exocyclic
amine; (ii) the backbone carbonyl of R334 moves to
accept a second hydrogen bond from this cytosine
amine; and (iii) the relatively rigid backbone of S336
undergoes a large φ torsional rotation from −142° to
+164°, which enables the peptide bond with A335 to
donate a hydrogen bond to the cytosine-N3. As the
A335 and S336 residues are located at the C-terminus of
the crystallized construct of U2AF⁶⁵, the energetic penalty
for adjusting to the third cytidine could be greater and
possibly influence downstream residues in the context of
the full-length U2AF⁶⁵ protein.
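
The backbone φ torsion discussed above is the dihedral angle defined by the atoms C(i−1), N(i), Cα(i) and C(i). A minimal sketch of a generic dihedral calculation from four Cartesian points is given below; the coordinates are hypothetical test geometries, not atoms from the structures, and the function is a standard vector formulation rather than the routine used by any particular refinement program.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, -180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical geometries: a right-angle twist and a trans arrangement.
quarter_turn = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(f"{quarter_turn:.1f}, {abs(trans):.1f}")
```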

The U2AF⁶⁵ RRM1 structure adapts to cytidines in the
3' region of the Py tract

The structures of dU2AF⁶⁵1,2 bound to either
(Br-dU3)dC4, (Br-dU5)dC6 or (Br-dU3)dC5 oligonucleotides
suggest that following relatively subtle structural
changes, U2AF⁶⁵ RRM1 can accommodate cytidine substitutions
in the 3' half of the bound oligonucleotide
(Figure 4). One of the 2.5 Å resolution dU2AF⁶⁵1,2/(Br-dU3)dC4
complexes (oligonucleotide 'P') offered a
well-defined view of a cytidine in the fifth binding site
(Figure 4A, Supplementary Figure S2C). In the second
dU2AF⁶⁵1,2/(Br-dU3)dC4 complex of the crystallographic
asymmetric unit (oligonucleotide 'E'), alternative
binding registers place the cytidine with partial occupancy
in the fifth and sixth binding sites as observed for the
Br-dU3 parent (Figure 1B, Supplementary Figure S2C).
This stabilized binding register for one of the complexes
suggests a slight preference for U2AF⁶⁵ to accommodate a
cytosine at the fifth over sixth sites. Despite the introduction
of a 5-bromo group, the U2AF⁶⁵ interactions with the
fifth dU of both 'P' and 'E' oligonucleotides in the baseline
Br-dU5 structure remain similar to the RNA-bound structure.
Namely, the backbone carbonyl of U2AF⁶⁵ R228
and the amide of H230 respectively form hydrogen
bonds with the uracil N3-H and O2 groups. Despite
these nucleotide interactions with a relatively rigid protein
backbone, U2AF⁶⁵ readily accommodates the switch to
the N3 hydrogen bond acceptor of the cytosine at the
fifth site. A slight movement of the deoxy-cytidine
relative to the uridine enables the U2AF⁶⁵ R228
carbonyl to accept an analogous hydrogen bond from
the cytosine exocyclic amine, and the H230-NH -
cytosine-O2 interaction remains unperturbed (Figure 4A,
Supplementary Movie S2). An ordered water molecule
bound to the uracil-O4 also appears to have been lost,
possibly owing to the switch from a hydrogen bond
acceptor to a donor for the cytosine exocyclic amine in
proximity to the U2AF⁶⁵ K225 and R227 side chains.

U2AF⁶⁵ undergoes a number of structural changes to
adapt to a cytosine at the sixth binding site (Figure 4B,
Supplementary Movie S3). The Br-dU occupies the fifth
and cytidine the sixth binding sites in both complexes of
the dU2AF⁶⁵1,2/(Br-dU5)dC6 structure (Supplementary
Figure S2H), which is the highest resolution among the
dU2AF⁶⁵1,2 structures (1.9 Å). The dU6 in both the 'P' and
'E' oligonucleotides of the Br-dU5 structure exhibits
similar U2AF⁶⁵ interactions as the RNA counterpart. The
uracil-O4 accepts hydrogen bonds from the backbone
N-H of U2AF⁶⁵ D231 and H230, the latter of which is
bifurcated by the hydrogen bond donated to the preceding
dU6-O2. The slightly higher resolution of the Br-dU5
compared with the rU12 structure (2.2 Å and 2.5 Å resolution,
respectively) reveals an ordered water molecule
that mediates hydrogen bonds among R150, D231 and
the uracil-N3-H of both 'P' and 'E' oligonucleotides. In
addition, the R150 side chains of both the 'P' and 'E'
oligonucleotides have rotamers that can donate two
hydrogen bonds to the uracil-O2 of dU6.

Following substitution of the cytosine at the sixth site, the
D231 side chain rotates to accept a direct hydrogen bond from
the cytosine exocyclic amine (Figure 4B). The introduction of
an ordered water molecule enables the U2AF⁶⁵ D231 peptide N-H
to donate a water-mediated hydrogen bond to the cytosine
exocyclic amine. Remarkably, RNA binding selects from two
alternative conformations of the R150 side chain that are
apparent in the high resolution structure of the apo-U2AF⁶⁵
RRM1 (35). The sixth uracil base is only observed to interact
with one of the two alternative R150 conformations, whereas the
hydrogen bond pattern of a cytosine is compatible with either.
Accordingly, the sixth cytosine selects a different alternative
conformation in each of the two crystallographically distinct
dU2AF⁶⁵1,2/(Br-dU5)dC6 copies. In one of the two (Br-dU5)dC6
complexes (oligonucleotide 'E'), the R150 remains in a location
similar to that observed when bound to uracil, where the side
chain contributes two direct hydrogen bonds to the cytosine-O2
and a water-mediated hydrogen bond to the cytosine-N3. The R150
side chain of the other dU2AF⁶⁵1,2/(Br-dU5)dC6 copy
(oligonucleotide 'P') has rotated to replace the water molecule
bound to the cytosine-N3. This movement of the R150 side chain
alters stacking with the adjacent dU7 base, which in turn
shifts to form a distinct set of apparently favourable
interactions with U2AF⁶⁵, including a direct hydrogen bond from
the uracil N-H to the available oxygen of the D231 side chain
and an indirect hydrogen bond via the water molecule bound to
the preceding cytosine amine (Figure 4C, far right panel).
Together, these structural changes indicate that cytidine is
compatible with a tight network of U2AF⁶⁵ interactions at the
sixth binding site.

Lastly, the dU2AF⁶⁵1,2/(Br-dU3)dC5 structure provides 2.2 Å
resolution views of a cytidine in the seventh binding site
(Supplementary Table S1). The binding registers of both
dU2AF⁶⁵1,2/(Br-dU3)dC5 complexes shifted to place the Br-dU5 in
the fourth and dC5 in the final seventh site (Figure 1B,
Supplementary Figure S2D). The stabilization of this binding
register compared with the mixture of Br-dU3 at the fourth and
fifth sites of the parent suggests a slight preference for
cytosine over uracil at the seventh binding site. By comparison
with other binding sites, the interactions of U2AF⁶⁵ with
nucleotides in the seventh site are minimal. The seventh
uracils of all the poly-rU-containing structures primarily
interact by aromatic stacking on the R150 side chain of U2AF⁶⁵.
At the higher resolutions of the Br-dU5 and (Br-dU3)dC5
structures compared with the original RNA-bound counterpart
(2.2 Å versus 2.5 Å resolutions), a water-mediated hydrogen
bond between the S147 side chain and the uracil-O4 or
cytidine-N4H2 becomes apparent. Following a slight shift in the
position of an intermediary water molecule, the cytidine at the
seventh site of the dU2AF⁶⁵1,2/(Br-dU3)dC5 structure otherwise
continues to stack with the R150 in a manner comparable with
the uracil.

Table 1. U2AF⁶⁵ affinities for degenerate Py tracts

a The U2AF⁶⁵1,2 protein comprises the core RRM1 and RRM2
domains connected by the wild-type linker, and the mutant
U2AF⁶⁵1,2 variant has four mutations in RRM2
(D293N/K329Q/L331K/Q333E).

b The nucleotide substitutions to be compared are emphasized in
bold font.

c Apparent equilibrium dissociation constants for the indicated
U2AF⁶⁵1,2 protein/Py tract combination are averages of at least
two experiments, and a representative sensorgram is shown in
Supplementary Figure S6. Affinities determined by ITC are given
in parentheses, and representative isotherms are shown in
Supplementary Figure S7. Splicing activities of the rC(10),
rC(7,10) and rC(4-11) substrates were reported previously (9).

d 'Fold Penalty' is the fold decrease in affinity relative to
wild-type U2AF⁶⁵1,2 binding the rAdML₁₃ Py tract. >3-fold
penalties are highlighted in bold.

U2AF⁶⁵ prefers uridines in the 5' region of the Py tract
and tolerates cytidines in the 3' region

The different abilities of the U2AF⁶⁵ RRM structures to adapt
to cytidines predicted that U2AF⁶⁵ would display
region-dependent preferences for cytidines and uridines in the
Py tract. To facilitate interpretation of the crystal
structures, we focused on determining the affinities of the
minimal U2AF⁶⁵ RRM1-RRM2 domain (U2AF⁶⁵1,2) for core Py tracts
derived from the prototypical adenovirus major late promoter
transcript (rAdML₁₃). Our primary method of steady-state SPR
has the advantage of concurrently monitoring the apparent
binding stoichiometry (Supplementary Figure S6, see 'Materials
and Methods' section). ITC experiments independently
corroborated the specificities determined by SPR (Supplementary
Figure S7).

AdML variants with a few cytidine substitutions in the Py tract
were previously shown to support in vitro splicing and
spliceosome complex formation at levels comparable with those
of the unmodified pre-mRNA parent, whereas a polycytidine tract
abolishes detectable use of the adjacent 3' splice site (9). We
first investigated whether these splicing activities correlated
with the abilities of U2AF⁶⁵1,2 to bind the Py tract RNAs
(Table 1). In agreement with the previously reported splicing
activities, the affinities of U2AF⁶⁵1,2 for Py tracts with one,
two and even three cytidine mutations at the fifth, seventh,
ninth or tenth positions remained similar to that for the
rAdML₁₃ oligonucleotide. We also found that U2AF⁶⁵1,2 exhibited
little or no detectable binding to a polyC tract of equivalent
length [rC(4-11)].
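
Table 1 reports a 'Fold Penalty', defined as the fold decrease
in affinity relative to wild-type U2AF⁶⁵1,2 binding the rAdML₁₃
Py tract. A minimal sketch of that arithmetic follows, assuming
that affinities are expressed as apparent dissociation
constants, so that the penalty is the ratio KD(variant) /
KD(reference). The function names and the KD values are
hypothetical placeholders and are not measurements from this
study.

```python
def fold_penalty(kd_variant, kd_reference):
    """Fold decrease in affinity, as a ratio of dissociation constants."""
    if kd_variant <= 0 or kd_reference <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_variant / kd_reference


def is_highlighted(penalty, threshold=3.0):
    """Table 1 highlights penalties greater than 3-fold."""
    return penalty > threshold


# Hypothetical KD values (same units for all entries), used only to
# exercise the functions; they are NOT values from Table 1.
reference_kd = 1.0
example_kds = {"variant_A": 1.5, "variant_B": 6.0}
penalties = {name: fold_penalty(kd, reference_kd)
             for name, kd in example_kds.items()}

for name, penalty in penalties.items():
    print(name, penalty, is_highlighted(penalty))
```

Because the penalty is a ratio, it is independent of the
concentration units as long as both constants share them.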

We next compared tracts of three, four or five uridines located
in either the 5' or 3' regions of a Py tract otherwise composed
of cytidines (Table 1). In all cases, U2AF⁶⁵ preferred uridines
to cytidines in the 5' region of the Py tract, whereas
cytidines could effectively substitute for uridines in the 3'
region. Consistent with the observation that four nucleotides
bind to each U2AF⁶⁵ RRM and considering the shared fourth
nucleotide at the RRM1/RRM2 interface, it is not surprising
that the U2AF⁶⁵1,2 affinity for a tract of three uridines in
the 5' region of the Py tract decreased slightly relative to
four or five uridines. The specificity of the U2AF⁶⁵1,2 protein
was greatest for the four-uridine tract (5-fold preference for
four consecutive uridines in the 5' as opposed to the 3' region
of an otherwise cytidine-rich Py tract, respectively named
5'-4rU and 3'-4rU RNAs, Table 1). Given that the U2AF⁶⁵ RRM1
and RRM2 respectively bind near the 3' and 5' termini of the Py
tract, the region-dependent uridine/cytidine preferences agreed
with the relative ease with which the U2AF⁶⁵ RRM1 structure
accommodates cytidines in comparison with the RRM2.
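
The 5-fold preference reported above can be placed on an
energetic scale with the standard relation ΔΔG = RT ln(fold
difference in affinity). The sketch below assumes a temperature
of 25 °C, which is an assumption made for illustration rather
than a condition stated in this section; the function name is
likewise hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol K)


def fold_to_ddg(fold_difference, temperature_k=298.15):
    """Free energy difference (kcal/mol) for a fold difference in affinity.

    temperature_k defaults to 25 degrees C, an assumed value.
    """
    if fold_difference <= 0:
        raise ValueError("fold difference must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(fold_difference)


# The 5-fold preference for the 5'-4rU over the 3'-4rU RNA.
ddg_five_fold = fold_to_ddg(5.0)
print(round(ddg_five_fold, 2))
```

Under these assumptions, a 5-fold preference corresponds to
slightly less than 1 kcal/mol, a modest energetic difference.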

Residues in U2AF⁶⁵ RRM2 contribute to region-dependent
pyrimidine specificity

We proceeded to test the direct involvement of the U2AF⁶⁵ RRM2
in determining the specificity of U2AF⁶⁵ for uridines in the 5'
region of the Py tract. Based on the structures, we introduced
four mutations within the RRM2 of U2AF⁶⁵1,2 (hereafter the
mutant U2AF⁶⁵1,2 variant) that were intended to favour cytidine
over uridine binding (Figure 5A and B): (i) D293N, replacing an
aspartate that would disfavour a cytosine-N3 at the first site
with an asparagine side chain, which is expected to form
similar hydrogen bonds with cytosine or uracil; (ii) K329Q,
likewise substituting a 'neutral' side chain in place of a
lysine that contributes an unfavourable electrostatic
environment for a cytidine-N4H2 at the second site; (iii)
L331K, changing a leucine to a hydrogen bond donor that could
potentially interact with a cytidine-N3 at the second site; and
(iv) Q333E, replacing a glutamine with a stringent hydrogen
bond acceptor for interactions with a cytidine-N4H2 at the
third site. The mutant U2AF⁶⁵1,2 variant continued to bind the
5'-4rU Py tract RNA with an affinity indistinguishable from
that of the wild-type protein, indicating that the RRM2
mutations did not detectably penalize recognition of uridines
in the 5' region of the Py tract. As expected based on the
crystal structures, the affinity of the mutant U2AF⁶⁵1,2
variant for the 3'-4rU RNA increased by 5.5-fold to a level
comparable with the 5'-4rU counterpart, indicating that the
mutated RRM2 had lost the ability to discriminate against
cytidines in the 5' region of the Py tract (Figure 5C and D).
These results directly implicated RRM2 in determining the
preference of U2AF⁶⁵ for uridines in the 5' region of the Py
tract.

Given the difficulties of predicting RRM-RNA interactions (36),
we did not attempt to enhance the specificity of a promiscuous
U2AF⁶⁵ RRM1. However, several observations indirectly
implicated RRM1 in recognizing the 3' region of the Py tract,
including the following: (i) the U2AF⁶⁵1,2 affinity for these
Py tracts is higher than expected for RRM2 alone, whereas the
inter-RRM linker is not directly involved in RNA binding
(26,29), leaving only RRM1; (ii) the U2AF⁶⁵1,2 affinities for
Py tracts with identical pyrimidine contents depended on the 5'
versus 3' locations of the four-uridine tract; and (iii) the
mutations in RRM2 only affect association with the 3'-4rU RNA
but have no effect on binding the 5'-4rU RNA. These strong
region-dependent effects call for respective contributions by
U2AF⁶⁵ RRM2 in binding the 5' nucleotides and RRM1 in binding
the 3' nucleotides of the 13-mer Py tracts.



Figure 5. U2AF⁶⁵ RRM2 determines region-dependent uridine/cytidine preferences in Py tracts. (A) Structure of dU2AF⁶⁵1,2 RRM2 bound to
dU1-dU4 (from Br-dU5). The unmodified counterparts of the mutated residues are shown as ball-and-stick representations and coloured by atom:
carbons, cyan; oxygens, red; nitrogens, blue. (B) Model of the mutant U2AF⁶⁵1,2 RRM2 showing the predicted interactions of the four mutated D293N/
K329Q/L331K/Q333E residues with cytidines (yellow). (C and D) Bar graphs of the affinities of the 5'-4rU and 3'-4rU (shaded) RNAs for the (C) wild-type
U2AF⁶⁵1,2 protein (blue) compared with the (D) D293N/K329Q/L331K/Q333E mutant U2AF⁶⁵1,2 protein (yellow).

Figure 6. Py tract sequences redistribute the U2AF⁶⁵ conformational ensemble. (A) Schematic representation of the UHM-containing U2AF⁶⁵1,2 construct and Py tract
RNA sequences used for SAXS experiments. (B) Experimental SAXS profiles (coloured circles) compared with profiles calculated from the best
fitting ensembles (black lines). Intensity data are offset for clarity. Colours are consistent throughout: rU₁₃, magenta; rAdML₁₃, green; rC(7,9,10),
orange. The SAXS profile of the apo-protein ('apo', blue) from reference (38) is shown for comparison. (C) The Dmax distributions of ensembles
calculated using EOM (33) (solid coloured lines) for the UHM-containing U2AF⁶⁵1,2 construct alone or bound to the rU₁₃ RNA compared with the randomized starting pool
(dashed grey line). Lower case letters mark the Dmax values of the models shown in E. (D) The Dmax distributions of the ensembles of the UHM-containing U2AF⁶⁵1,2 construct bound
to rU₁₃, rAdML₁₃ or rC(7,9,10) Py tract RNAs. (E) Representative models from the EOM analyses and corresponding Dmax values.



Py tract variants select different U2AF⁶⁵ conformations

The apparent difference in the specificities of the two U2AF⁶⁵
RRMs raised the possibility that U2AF⁶⁵ adapts to degenerate Py
tracts by modulating the proximities of a promiscuous RRM1 and
a stringent RRM2 domain. Prior NMR methods indicated that a
9-mer poly-rU RNA selects an 'open' U2AF⁶⁵1,2 conformation with
a side-by-side arrangement of the RRMs. In the absence of RNA,
the 'open' U2AF⁶⁵1,2 conformation was suggested to be minor
relative to a major 'closed' conformation, in which the RNA
binding surface of RRM1 is masked by RRM2 (26). We also observe
a range of solution conformations for the apo-U2AF⁶⁵1,2 protein
by SAXS, although this distribution is broader than that
detected by NMR methods (37). Here, we further investigated the
influence of binding Py tracts that have distinct cytidine
compositions on the distribution of U2AF⁶⁵ conformations
(Figure 6). To ensure that the scattering data primarily
reflect the protein rather than the RNA conformations, we used
a larger U2AF⁶⁵ construct that included the C-terminal UHM in
addition to the RRM1 and RRM2 (referred to here as the
UHM-containing U2AF⁶⁵1,2 construct; Figure 6A). The U2AF⁶⁵ UHM
lacks detectable RNA affinity in isolation (39), and NMR
spectra suggest that the U2AF⁶⁵1,2 contacts the Py tract in a
similar manner in the presence of the UHM domain (26). We also
have shown that additional residues surrounding the core
U2AF⁶⁵1,2 fragment have little detectable influence on the
conformational ensemble (37). As such, we use the
UHM-containing construct as a means to study the influence of
different Py tract RNA sequences on the U2AF⁶⁵ inter-domain
proximities in the solution pool. We compared the effects of
three representative Py tracts on the distribution of its
conformations. These Py tracts include a homogeneous 13-mer
uridine tract (rU₁₃), the unmodified AdML Py tract composed of
an eight-uridine core embedded within five cytidines (rAdML₁₃)
and the rC(7,9,10) variant with three internal cytidine
substitutions. All three RNAs share a 13-nucleotide length and
comparable affinities for U2AF⁶⁵, despite the different number
of cytidines (Figure 6A).

The SAXS data sets for the UHM-containing U2AF⁶⁵1,2 construct
with and without RNA appear monodisperse and extend beyond
q = 0.30 Å⁻¹ (Figure 6B, Supplementary Figure S8). In the
absence of RNA, the average solution shape of the apo-protein
comprises three lobes corresponding to the relatively separated
RRM1, RRM2 and UHM domains (38). We re-analysed the apo-protein
SAXS data using an ensemble approach (33), in which a subset of
20 structures that best fit the data was selected from a
starting pool of 10000 conformations comprising the RRM1, RRM2
and UHM structures tethered in randomized orientations and
proximities by ab initio linkers. As observed for the
apo-U2AF⁶⁵1,2 (37), the distribution of selected conformations
of the UHM-containing apo-protein closely matches that of the
randomized starting pool (Figure 6C). Although no change in the
average molecular dimensions was apparent following addition of
the Py tract RNAs (38) (Supplementary Table S2), the ensemble
analyses revealed distinct changes in the distribution of the
solution conformations (Figure 6C-E). The distribution of the
protein conformations following association with the
homogeneous rU₁₃ tract remained broad, but the molecular
dimensions of the most prevalent conformations were slightly
larger than for the apo-protein (~120 Å compared with 100 Å).
The conformational ensemble of the rAdML₁₃-bound protein also
remained broad, but the most prevalent selected conformations
shifted back to a compact arrangement of the U2AF⁶⁵ RRMs
(~100 Å). The cytidine-interrupted rC(7,9,10) tract increased
the prevalence of more extended conformations (~150-180 Å) but
lacked clear preference for a conformation of any given size.
In summary, the rAdML₁₃, rU₁₃ and rC(7,9,10) Py tracts select
subsets of U2AF⁶⁵ conformations with distinct inter-RRM
proximities from the pre-existing solution ensemble of the
apo-protein.
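
The ensemble approach described above selects a small subset of
conformations, from a large randomized pool, whose averaged
scattering profile best fits the data. The sketch below
illustrates that idea only. It uses a simple stochastic swap
search rather than the genetic algorithm of the EOM program,
and the Guinier-like toy profiles, the uniform error value, the
pool size and all function names are assumptions made for
illustration.

```python
import math
import random


def guinier_profile(rg, q_values):
    """Toy scattering curve I(q) = exp(-(q*Rg)^2 / 3) for one conformer."""
    return [math.exp(-((q * rg) ** 2) / 3.0) for q in q_values]


def ensemble_average(profiles, members):
    """Average the profiles of the selected pool members (repeats allowed)."""
    n_points = len(profiles[0])
    return [sum(profiles[m][i] for m in members) / len(members)
            for i in range(n_points)]


def reduced_chi2(calc, observed, sigma):
    """Mean squared, error-weighted deviation between two curves."""
    return sum(((c - o) / s) ** 2
               for c, o, s in zip(calc, observed, sigma)) / len(observed)


def select_ensemble(profiles, observed, sigma, size=20, steps=2000, seed=0):
    """Swap one member at a time, keeping swaps that do not worsen the fit."""
    rng = random.Random(seed)
    members = rng.sample(range(len(profiles)), size)
    best = reduced_chi2(ensemble_average(profiles, members), observed, sigma)
    for _ in range(steps):
        trial = list(members)
        trial[rng.randrange(size)] = rng.randrange(len(profiles))
        score = reduced_chi2(ensemble_average(profiles, trial), observed, sigma)
        if score <= best:
            members, best = trial, score
    return members, best


# Synthetic pool of 500 conformers with assumed radii of gyration.
pool_rng = random.Random(1)
q_values = [0.01 * i for i in range(1, 31)]
pool = [guinier_profile(pool_rng.uniform(20.0, 50.0), q_values)
        for _ in range(500)]
# Synthetic "experimental" curve: the average of an arbitrary pool subset.
observed = ensemble_average(pool, list(range(0, 500, 25)))
sigma = [0.01] * len(q_values)

members, chi2 = select_ensemble(pool, observed, sigma)
print(len(members), chi2)
```

Comparing the size distribution of the selected members with
that of the whole pool is then what reveals whether the data
favour compact or extended conformations.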



DISCUSSION 

Here, a series of high resolution U2AF⁶⁵ structures suggests
that a subset of binding sites in the N-terminal RRM1 can
tolerate cytidine substitutions of uridine-rich Py tracts more
readily than others in the C-terminal RRM2 (Figures 3 and 4).
We note that the two deoxy-cytidines in the 5' terminal
positions of the RRM2-bound oligonucleotide are poorly ordered
and could be influenced by the minimal backbone. However, the
increased disorder of the cytosines relative to the uracils of
the dU counterparts is likely to reflect weakened interactions
that can be extended to U2AF⁶⁵ recognition of Py tract RNAs.
Accordingly, line broadening of the NMR signals for U2AF⁶⁵
bound to poly(rU) suggests some flexibility in the RNA
interface (26). In light of the C-to-N-terminal orientation of
the U2AF⁶⁵ RRMs relative to the 5'-3' orientation of the bound
RNA strand, the discrimination of U2AF⁶⁵ against cytidines in
the 5' region of Py tracts and permissiveness towards cytidines
in the 3' region supports the structural conclusion that RRM2
has more difficulty adapting to cytidines than does RRM1
(Table 1). Site-directed mutagenesis confirms a role for the
U2AF⁶⁵ RRM2 in specifying uridines near the 5' region of the Py
tract and indirectly implicates RRM1 in adapting to the 3'
region (Figure 5). Complementary SAXS studies further
demonstrate that Py tracts with different cytidine contents
select U2AF⁶⁵ conformations with different inter-RRM spacings
from the solution ensemble (Figures 6 and 7).

The prevalent conformations detected by SAXS of the
UHM-containing apo-protein are likely to correspond to the
'closed' NMR model, with the distinction that the SAXS analyses
provide evidence that elongated conformations beyond the radius
of the paramagnetic relaxation enhancement labels contribute to
the solution ensemble (37). Given that the molecular dimensions
of the 'open' and 'closed' U2AF⁶⁵ models are comparable at the
resolution of the SAXS analyses, the most prevalent
conformations of the complex with AdML₁₃ are likely to
correspond to 'open', side-by-side conformations binding the
central eight uridines of the AdML₁₃ RNA, comparable with the
NMR model of the U2AF⁶⁵1,2/rU₉ complex (26). We suggest that
the increased molecular dimensions of the dominant
conformations of the rU₁₃ complex arise from the availability
of distal RRM binding sites along the longer uridine tract. The
ensemble of the rC(7,9,10) complex lacks a dominant
conformation, which is consistent with a model in which the
U2AF⁶⁵ RRM2 specifies the remaining three contiguous uridines,
whereas RRM1 is compatible with a number of cytidine-containing
binding sites near the 3' end of the Py tract sequence (e.g.
two redundant CUCC motifs at nucleotides 7-10 or 10-13).

Together, the U2AF⁶⁵ structures and binding preferences
reported here support and refine a recent model for U2AF⁶⁵
multi-domain conformational selection of pre-mRNA splice sites,
which is based on elegant NMR data and biochemical experiments
(26). A main feature of this model is that a minor 'open'
conformation of apo-U2AF⁶⁵ is selected by Py tract binding,
whereas in the major 'closed' conformation, the RRM1 RNA
binding surface is occluded by interactions with the α-helical
'back' of RRM2 and hence unavailable for RNA binding. Based on
the structural and affinity data that we present here and in
reference (37), we propose three key revisions of this model to
better explain the ability of U2AF⁶⁵ to recognize degenerate
pre-mRNA splice sites. First, we note that the RRM masked by
the 'closed' conformation of apo-U2AF⁶⁵1,2 is promiscuous for
RNA sequences. This observation suggests that the 'closed'
conformation could protect U2AF⁶⁵ against non-specific RNA
interactions and hence inappropriate splice site activation.
Second, as shown here for the case of R150 at the seventh
nucleotide binding site, local conformational selection of
pre-existing alternative side chain conformers can contribute
to the ability of U2AF⁶⁵ to adapt to cytidine substitutions.
Third, we suggest that the two-state model could be an
oversimplification of the U2AF⁶⁵ conformations available for
selection by degenerate Py tracts. Conformations similar to the
'open' and 'closed' NMR-based U2AF⁶⁵ conformations contribute
to the solution ensembles detected by X-ray scattering.
However, both of these NMR structures are characterized by
close contacts between the two U2AF⁶⁵ RRMs, which are
insufficient to describe the X-ray scattering data of the
apo-U2AF⁶⁵1,2 protein (37). Instead, the scattering data of the
apo-U2AF⁶⁵1,2 and the UHM-containing U2AF⁶⁵1,2 proteins are
better fit by broad conformational ensembles closely resembling
the randomized starting pools of RRM (and UHM) domains
connected by ab initio linkers. This subtle discrepancy over
the relative populations of compact 'closed'/'open' and more
extended conformations suggests that didactic studies are
needed to better understand the outcomes of PRE and SAXS
techniques when applied to multi-domain proteins. Regardless,
distinct U2AF⁶⁵ conformations with increased molecular
dimensions are enriched in the ensembles bound to the
homogeneous rU₁₃ or three-uridine rC(7,9,10) Py tracts. This
finding emphasizes that the U2AF⁶⁵ RRM1 and RRM2 can
participate independently in identifying 3' splice site
sequences, which increases the diversity of Py tracts that can
be recognized by U2AF⁶⁵.

Figure 7. Model for conformational selection of U2AF⁶⁵ RRM1-RRM2 by diverse Py tracts. Selection of apo-U2AF⁶⁵1,2 conformations by binding
the rU₁₃, rAdML₁₃ and rC(7,9,10) Py tracts used in this study. Representative conformations of tandem RRMs from the selected ensembles are
shown with space-filling graphics. Although linear RNA sequences are shown for clarity, the 3' splice site is likely to exhibit curvature in
three-dimensions (38).

In summary, we propose that degenerate Py tracts select
distinct proximities of a promiscuous RRM1 and stringent RRM2
from the apo-U2AF⁶⁵ conformational ensemble, which would shift
interspersed cytidines to permissive U2AF⁶⁵ sites without
penalizing RNA affinity (Figure 7). Although this result
emphasizes conformational selection over an induced fit
mechanism of U2AF⁶⁵/RNA recognition, the apparently weak
association between the RRMs could facilitate 'fine-tuning' of
initial U2AF⁶⁵ interactions with the RNA. This revised model
for U2AF⁶⁵ recognition of diverse metazoan splice sites
clarifies prior site-specific cross-linking analyses that
demonstrated broad, overlapping binding sites for U2AF⁶⁵ RRM1
and RRM2 along the Py tracts of pre-mRNAs (40). In light of the
crystal and SAXS structures, the range of cross-linking
patterns arises from the independent adjustment of the U2AF⁶⁵
RRM1 or RRM2 binding registers along a degenerate Py tract. We
note that U2AF⁶⁵ can accommodate up to five consecutive
cytidines in a Py tract without altering RNA affinity or
pre-mRNA splicing (Table 1) (6,7,9). By contrast, continuous
stretches of cytidines saturate the cytidine-compatible binding
sites of U2AF⁶⁵. As such, these cytidines are forced to engage
the stringent U2AF⁶⁵ sites, which strongly inhibits
U2AF⁶⁵-dependent RNA binding and pre-mRNA splicing.

Our results highlight the ability of a promiscuous RRM1 and
specific RRM2 to independently seek compatible binding sites as
a key factor for human U2AF⁶⁵ to recognize degenerate splice
site signals. As these studies are focused on the human U2AF⁶⁵
homologue, it remains possible that a 'closed' U2AF⁶⁵
conformation plays a greater role in organisms with short,
uridine-rich, consensus Py tracts at the 3' splice sites, such
as Caenorhabditis elegans (41) or Saccharomyces cerevisiae
(42). In humans, this model for U2AF⁶⁵ - pre-mRNA splice site
recognition is likely to have important implications for
disease-associated mutations. For example, shortening in the
length of the Py tract preceding a splice acceptor site of the
cystic fibrosis transmembrane conductance regulator (cftr) gene
is the most common defect in men with cystic fibrosis and
infertility owing to congenital bilateral absence of the vas
deferens (13). Normal phenotypes are produced by Py tracts that
have stretches of 9 Us (UUUUUUUUUAACAG) and 7 Us
(UGUUUUUUUAACAG), whereas splicing of the associated 3' splice
site is nearly abolished by shortening of the Py tract to 5 Us
(UGUGUUUUUAACAG) (12). Based on the results presented here, the
TG expansion in the 5' region of the cftr Py tract is likely to
interfere with the sequence-specific association of the U2AF⁶⁵
RRM2. Future studies will illuminate further roles for
conformational selection in spliceosome assembly at regulated
pre-mRNA splice sites and its consequences for human genetic
disease.
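
The uridine stretch lengths quoted for the three cftr Py tract
variants can be checked directly from the sequences given
above. The following sketch does so; the function name and the
dictionary keys are hypothetical labels, and the calculation
says nothing about splicing outcome beyond the run lengths
themselves.

```python
from itertools import groupby


def longest_uridine_run(sequence):
    """Length of the longest stretch of consecutive 'U' characters."""
    runs = [len(list(group))
            for base, group in groupby(sequence) if base == "U"]
    return max(runs, default=0)


# Sequences as quoted in the text; the keys are illustrative labels.
cftr_py_tracts = {
    "9U": "UUUUUUUUUAACAG",
    "7U": "UGUUUUUUUAACAG",
    "5U": "UGUGUUUUUAACAG",
}
run_lengths = {name: longest_uridine_run(seq)
               for name, seq in cftr_py_tracts.items()}

for name, length in run_lengths.items():
    print(name, length)
```

All three sequences have the same overall length, so the
variants differ only in how far the TG repeats extend into the
5' end of the uridine stretch.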